coldata: fix Bytes invariant in some cases #59028
You were not able to reproduce this? I assume the unit test is a very specific reproduction, though, right?
Reviewed 4 of 4 files at r1, 4 of 4 files at r2, 1 of 2 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
pkg/col/coldata/vec_tmpl.go, line 81 at r2 (raw file):
// Note that here we rely on the fact that selection vectors are
// increasing sequences.
fromCol.UpdateOffsetsToBeNonDecreasing(sel[len(sel)-1] + 1)
Can we have a non-nil zero-length selection vector? Maybe it makes sense to update the earlier if case to `len(args.Sel) == 0`
pkg/sql/colexec/spilling_queue.go, line 242 at r3 (raw file):
if q.numInMemoryItems > 0 {
	// If we have already enqueued at least one batch, let's try to copy
	// as many tuples into it as it has the capacity for.
Could you add a unit test that would tickle this bug for regression purposes?
Yes, I'm not able to find a repro for the bytes invariant problem (I think it could happen during `Vec.Copy` with `SelOnDest` set to `true`, which is used only by the CASE operator, but we don't support CASE with Bytes output type).
Still, I cannot persuade myself that the scenario mentioned in the unit test will never occur in production, and updating the offsets up to the largest index mentioned in the selection vector has always been the intention in `SetLength` (as evidenced by the similar code in `Append` when `Sel` is non-nil), so I believe it is worth merging the fix and, possibly, backporting it to previous releases.
Update: actually, this issue might be the root cause of #57297, so I'm even more convinced that we should merge and backport the fix.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/col/coldata/vec_tmpl.go, line 81 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Can we have a non-nil zero-length selection vector? Maybe it makes sense to update the earlier if case to `len(args.Sel) == 0`
Good point. This currently can never occur because we never attempt to append 0 values. I've updated the contract of `Append` and left some comments in other places to highlight that.
I slightly prefer doing it this way (rather than checking whether we are trying to append 0 values and adding custom behavior for that case), since a violation of these assumptions elsewhere would then show up as an internal error. Let me know if you would prefer to be on the safe side.
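To make the trade-off discussed here concrete, the following is a minimal sketch, not the actual `coldata` code. The function name `selUpperBound` and the error text are hypothetical; the sketch only assumes, as stated in the review, that selection vectors are increasing sequences and that callers never pass zero values. It reports a contract violation explicitly instead of silently special-casing an empty selection vector.

```go
package main

import (
	"errors"
	"fmt"
)

// selUpperBound is a hypothetical helper. It returns one past the largest
// index referenced by sel, relying on the assumption that sel is an
// increasing sequence, so its last element is the largest.
func selUpperBound(sel []int) (int, error) {
	if len(sel) == 0 {
		// The caller broke the "never append 0 values" contract. Surfacing
		// this as an error keeps the violated assumption visible.
		return 0, errors.New("unexpected empty selection vector")
	}
	return sel[len(sel)-1] + 1, nil
}

func main() {
	n, err := selUpperBound([]int{1, 4, 7})
	fmt.Println(n, err) // 8 <nil>

	_, err = selUpperBound([]int{})
	fmt.Println(err) // unexpected empty selection vector
}
```

The alternative raised in the review would treat a zero-length selection vector the same way as a nil one, which is safer at runtime but hides the violated assumption.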
pkg/sql/colexec/spilling_queue.go, line 242 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Could you add a unit test that would tickle this bug for regression purposes?
Opened #59077.
`execgen.SLICE` directive is used only in one place, and we can use `execgen.WINDOW` there instead (which will have the same effect).
Release note: None
Reviewed 4 of 4 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)
In `SetLength` method we are maintaining the invariant of `Bytes` vectors that the offsets are non-decreasing sequences. Previously, this was done incorrectly when a selection vector is present on the batch which could lead to out of bounds errors (caught by our panic-catcher) some time later. This is now fixed by correctly paying attention to the selection vector.
I neither can easily come up with an example query that would trigger this condition nor can I prove that it won't occur, but I think we have seen a single sentry report that could be explained by this bug, so I think it's worth backporting.
Additionally, this commit uses the assumption that the selection vectors are increasing sequences in order to calculate the largest index accessed by the batch.
Release note (bug fix): Previously, CockroachDB could encounter an internal error when executing queries with BYTES or STRING types via the vectorized engine in rare circumstances, and now this is fixed.
TFTR! bors r+
Build succeeded:
execgen: remove SLICE method
`execgen.SLICE` directive is used only in one place, and we can use `execgen.WINDOW` there instead (which will have the same effect).
Release note: None
coldata: fix updating offsets of bytes in Batch.SetLength
In `SetLength` method we are maintaining the invariant of `Bytes`
vectors that the offsets are non-decreasing sequences. Previously, this
was done incorrectly when a selection vector is present on the batch
which could lead to out of bounds errors (caught by our panic-catcher)
some time later. This is now fixed by correctly paying attention to the
selection vector.
I neither can easily come up with an example query that would trigger
this condition nor can I prove that it won't occur, but I think we have
seen a single sentry report that could be explained by this bug, so I
think it's worth backporting.
Additionally, this commit uses the assumption that the selection vectors
are increasing sequences in order to calculate the largest index
accessed by the batch.
Fixes: #57297.
Release note (bug fix): Previously, CockroachDB could encounter an
internal error when executing queries with BYTES or STRING types via the
vectorized engine in rare circumstances, and now this is fixed.
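The bug described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `flatBytes`, `updateOffsets`, `offsetsPrefix`, and `nonDecreasing` are hypothetical names, and the layout (one shared data buffer plus an offsets slice) is a simplification rather than the real `coldata.Bytes` type. It shows why back-filling offsets only up to the batch length breaks the non-decreasing invariant when a selection vector points at a larger index.

```go
package main

import "fmt"

// flatBytes is a hypothetical, simplified flat bytes vector: element i is
// data[offsets[i]:offsets[i+1]]. offsets[0..maxSet] are known to be valid.
type flatBytes struct {
	data    []byte
	offsets []int
	maxSet  int
}

func newFlatBytes(capacity int) *flatBytes {
	return &flatBytes{offsets: make([]int, capacity+1)}
}

// set stores v as element i; elements must be set in increasing index order.
func (b *flatBytes) set(i int, v []byte) {
	b.updateOffsets(i)
	b.data = append(b.data[:b.offsets[i]], v...)
	b.offsets[i+1] = len(b.data)
	b.maxSet = i + 1
}

func (b *flatBytes) get(i int) []byte { return b.data[b.offsets[i]:b.offsets[i+1]] }

// updateOffsets back-fills offsets so that offsets[0..n] is non-decreasing,
// which makes every unset element below n read as empty.
func (b *flatBytes) updateOffsets(n int) {
	for i := b.maxSet + 1; i <= n; i++ {
		b.offsets[i] = b.offsets[b.maxSet]
	}
	if n > b.maxSet {
		b.maxSet = n
	}
}

// nonDecreasing reports whether offsets[0..n] satisfies the invariant.
func (b *flatBytes) nonDecreasing(n int) bool {
	for i := 1; i <= n; i++ {
		if b.offsets[i] < b.offsets[i-1] {
			return false
		}
	}
	return true
}

// offsetsPrefix returns how many elements need valid offsets. With a
// selection vector (assumed increasing), that is the largest index plus one.
func offsetsPrefix(length int, sel []int) int {
	if sel != nil && length > 0 {
		return sel[length-1] + 1
	}
	return length
}

func main() {
	sel := []int{0, 2} // the batch exposes physical elements 0 and 2
	length := len(sel)

	naive := newFlatBytes(4)
	naive.set(0, []byte("ab"))
	naive.updateOffsets(length) // ignores sel: offsets[3] stays stale
	fmt.Println(naive.nonDecreasing(sel[length-1] + 1)) // false

	fixed := newFlatBytes(4)
	fixed.set(0, []byte("ab"))
	fixed.updateOffsets(offsetsPrefix(length, sel))
	fmt.Println(fixed.nonDecreasing(sel[length-1] + 1)) // true
}
```

In the naive variant, reading element 2 would slice `data[2:0]` and panic, which matches the kind of delayed out of bounds error the commit message mentions.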